Efficient XML Keyword Search Based on DAG-Compression
Authors
Abstract
Unlike XML query languages such as XPath, which require knowledge of both the query language and the document structure, keyword search is open to anybody. As XML sources grow rapidly in size, so does the need for efficient search indices on XML data that support keyword search. In this paper, we present an approach to XML keyword search that is based on the DAG of the XML data, in which repeated substructures are represented only once and therefore have to be searched only once. As our performance evaluation shows, this DAG-based extension of the set intersection search algorithm [1], [2] can, on large documents, be more than twice as fast as the XML-based approach. Additionally, we use a smaller index, i.e., we consume less main memory to compute the results.
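The idea that repeated substructures are stored and searched only once can be illustrated with a minimal sketch. This is not the paper's index or its set intersection algorithm; it only assumes a simple hash-consing scheme in which structurally identical subtrees (same tag, same text, same children) share one DAG node. The function names `dag_compress` and `contains_keyword` are hypothetical, and attributes and tail text are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def dag_compress(root):
    """Map an XML tree to a DAG by sharing structurally identical subtrees.

    Returns (root_id, nodes), where nodes[i] = (tag, text, child_ids).
    Children are always numbered before their parents.
    Simplification: attributes and tail text are ignored.
    """
    table = {}  # subtree signature -> DAG node id
    nodes = []  # DAG node id -> signature

    def visit(elem):
        child_ids = tuple(visit(child) for child in elem)
        text = (elem.text or "").strip()
        signature = (elem.tag, text, child_ids)
        if signature not in table:
            table[signature] = len(nodes)
            nodes.append(signature)
        return table[signature]

    return visit(root), nodes


def contains_keyword(nodes, keyword):
    """For each DAG node, report whether its subtree contains the keyword.

    Each shared subtree is inspected exactly once, however often it
    occurs in the original tree.
    """
    hit = []
    for _tag, text, child_ids in nodes:
        hit.append(keyword in text.split() or any(hit[c] for c in child_ids))
    return hit


doc = ET.fromstring(
    "<lib>"
    "<book><title>XML</title><year>2010</year></book>"
    "<book><title>XML</title><year>2010</year></book>"
    "<book><title>DAG</title><year>2010</year></book>"
    "</lib>"
)
root_id, nodes = dag_compress(doc)
tree_size = sum(1 for _ in doc.iter())  # number of nodes in the tree
dag_size = len(nodes)                   # number of nodes in the DAG
hits = contains_keyword(nodes, "XML")
```

In this toy document the two identical `book` subtrees collapse into one DAG node, so the keyword check runs over fewer nodes than the tree contains. How the actual approach maps DAG hits back to tree results is not shown here.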
Similar resources
Keyword Search on DAG-Compressed XML Data
With the growing size of publicly available XML document collections, fast keyword search becomes increasingly important. We present an indexing and keyword search technique that is suitable for DAG-compressed data and has the advantage that common subtrees have to be searched only once. We also present a performance evaluation that shows that our DAG-compressed index and search technique is supe...
SpiderX: Fast XML Exploration System
Keyword search in XML has gained popularity as it enables users to easily access XML data without the need of learning query languages and studying complex data schemas. In XML keyword search, query semantics is based on the concept of Lowest Common Ancestor (LCA), e.g., SLCA and ELCA. However, LCA-based search methods depend heavily on hierarchical structures of XML data, which may result in m...
Efficient XML Keyword Search: From Graph Model to Tree Model
Keyword search, as opposed to traditional structured queries, has become increasingly popular for querying XML data in recent years. XML documents usually contain some ID nodes and IDREF nodes to represent reference relationships among the data. Existing works model an XML document with ID/IDREF as a graph, where the keyword query results are computed by graph traversal. As a compa...
ICRA: Effective Semantics for Ranked XML Keyword Search
Keyword search is a user-friendly way to query XML databases. Most previous efforts in this area focus on keyword proximity search in XML based on either a tree data model or a graph (or digraph) data model. The tree data model for XML is generally simple and efficient for keyword proximity search. However, it cannot capture connections such as ID references in XML databases. In contrast, technique...
Exploiting ID References for Effective Keyword Search in XML Documents
In this paper, we study a novel Tree + IDREF data model for keyword search in XML. In this model, we propose the novel Lowest Referred Ancestor (LRA) pair, Extended LRA (ELRA) pair, and ELRA group semantics for effective and efficient keyword search. We develop efficient algorithms to compute the search results based on our semantics. An experimental study shows the superiority of our approach.